A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
There has been a resurgence of interest in multiagent reinforcement learning (MARL), due partly to the recent success of deep neural networks. The simplest form of MARL is independent reinforcement learning (InRL), where each agent treats all of its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe a meta-algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection.
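Joint-policy correlation is estimated from cross-play: several independent instances of the same InRL setup are trained, and every pairing of a player-1 policy from one instance with a player-2 policy from another is evaluated. The paper reports the average proportional loss (D − O)/D, where D is the mean return of co-trained (diagonal) pairs and O the mean return of cross-instance (off-diagonal) pairs. Below is a minimal Python sketch of that computation; the function name and toy numbers are illustrative, not from the paper.

```python
import numpy as np

def joint_policy_correlation(returns):
    """Average proportional loss (D - O) / D from a cross-play matrix.

    returns[i, j] = mean episode return when player 1 uses the policy from
    training instance i and player 2 uses the policy from instance j, so
    diagonal entries pair policies that were trained together.
    """
    returns = np.asarray(returns, dtype=float)
    n = returns.shape[0]
    diag_mean = np.trace(returns) / n                   # D: co-trained pairs
    off_mean = returns[~np.eye(n, dtype=bool)].mean()   # O: cross-play pairs
    return (diag_mean - off_mean) / diag_mean

# Toy numbers: agents score well with the partner they trained with
# (diagonal) but noticeably worse with partners from other runs.
R = [[10.0, 4.0, 5.0],
     [3.0, 9.5, 4.5],
     [5.5, 4.0, 11.0]]
print(joint_policy_correlation(R))  # ~0.57: strong overfitting to co-trained partners
```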
Reviews: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
Summary: "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning" presents a novel scalable algorithm that is shown to converge to better behaviours in partially-observable Multi-Agent Reinforcement Learning scenarios compared to previous methods. The paper begins with describing the problem, mainly that training reinforcement learning agents independently (i.e. each agent ignores the behaviours of the other agents and treats them as part of the environment) results in policies which can significantly overfit to only the agent behaviours observed during training time, failing to generalize when later set against new opponent behaviours. The paper then describes its solution, a generalization of the Double Oracle algorithm. The algorithm works using the following process: first, given a set of initial policies for each player, an empirical payoff tensor is created and from that a meta-strategy is learnt for each player which is the mixture over that initial policy set which achieves the highest value. Then each player i in the game is iterated, and a new policy is trained against policies sampled from the meta-strategies of the other agents not equal to i.
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel
A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning
Today we will dig into the paper A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning, one of the core ideas used in the development of #AlphaStar. There are several concepts in AlphaStar that won't be treated here. The aim is to dig into the concepts behind what has been described as the "Nash League" and how game theory came to mix with reinforcement learning. At the end of this article you should have a notion of the Double Oracle algorithm, Deep Cognitive Hierarchies, and Policy-Space Response Oracles. For this post you should be familiar with some game theory concepts, such as the setup of a strategic game in the form of a payoff matrix, Nash equilibria, and best responses.
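As a quick refresher on the last of those prerequisites, the snippet below computes a pure best response to a fixed opponent mixture in a one-shot matrix game, using rock-paper-scissors as the example; the numbers are purely illustrative.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum):
# rows are our actions, columns are the opponent's.
A = np.array([[0, -1,  1],    # rock
              [1,  0, -1],    # paper
              [-1, 1,  0]])   # scissors

def best_response(payoff, opponent_mixture):
    """Pure strategy with the highest expected payoff against a fixed mixture."""
    expected = payoff @ opponent_mixture
    return int(np.argmax(expected)), expected

# Against an opponent who over-plays rock, paper (index 1) is the best response.
action, values = best_response(A, np.array([0.6, 0.2, 0.2]))
print(action, values)  # 1 [ 0.   0.4 -0.4]
```

This best-response step is exactly the oracle that Double Oracle and its generalizations repeatedly invoke against the current meta-strategy.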